AITopics | adaptive allocation rule

Finite-TimeAnalysisofRound-Robin Kullback-LeiblerUpperConfidenceBoundsfor OptimalAdaptiveAllocationwithMultiplePlaysand MarkovianRewards

Neural Information Processing SystemsFeb-8-2026, 12:46:35 GMT

Forouranalysis wedevise several concentration results forMarkovchains, including amaximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal, finite-time guarantees for the case of multiple plays, and i.i.d.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe > Hungary > Budapest > Budapest (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)
Information Technology > Data Science > Data Mining > Big Data (0.31)

Add feedback

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Neural Information Processing SystemsDec-24-2025, 02:01:15 GMT

We study an extension of the classic stochastic multi-armed bandit problem which involves multiple plays and Markovian rewards in the rested bandits setting. In order to tackle this problem we consider an adaptive allocation rule which at each stage combines the information from the sample means of all the arms, with the Kullback-Leibler upper confidence bound of a single arm which is selected in round-robin way. For rewards generated from a one-parameter exponential family of Markov chains, we provide a finite-time upper bound for the regret incurred from this adaptive allocation rule, which reveals the logarithmic dependence of the regret on the time horizon, and which is asymptotically optimal. For our analysis we devise several concentration results for Markov chains, including a maximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal, finite-time guarantees for the case of multiple plays, and i.i.d.

finite-time analysis, optimal adaptive allocation, round-robin kullback-leibler upper confidence bound, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.96)
Information Technology > Artificial Intelligence > Machine Learning (0.85)

Add feedback

597c7b407a02cc0a92167e7a371eca25-Paper.pdf

Neural Information Processing SystemsOct-2-2025, 23:59:09 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Data Science > Data Mining > Big Data (0.50)

Add feedback

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Neural Information Processing SystemsOct-10-2024, 06:44:19 GMT

We study an extension of the classic stochastic multi-armed bandit problem which involves multiple plays and Markovian rewards in the rested bandits setting. In order to tackle this problem we consider an adaptive allocation rule which at each stage combines the information from the sample means of all the arms, with the Kullback-Leibler upper confidence bound of a single arm which is selected in round-robin way. For rewards generated from a one-parameter exponential family of Markov chains, we provide a finite-time upper bound for the regret incurred from this adaptive allocation rule, which reveals the logarithmic dependence of the regret on the time horizon, and which is asymptotically optimal. For our analysis we devise several concentration results for Markov chains, including a maximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal, finite-time guarantees for the case of multiple plays, and i.i.d.

multiple play and markovian reward, optimal adaptive allocation, round-robin kullback-leibler upper confidence bound, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.91)

Add feedback

Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Moulos, Vrettos

arXiv.org Machine LearningJan-30-2020

We study an extension of the classic stochastic multi-armed bandit problem which involves Markovian rewards and multiple plays. In order to tackle this problem we consider an index based adaptive allocation rule which at each stage combines calculations of sample means, and of upper confidence bounds, using the Kullback-Leibler divergence rate, for the stationary expected reward of Markovian arms. For rewards generated from a one-parameter exponential family of Markov chains, we provide a finite-time upper bound for the regret incurred from this adaptive allocation rule, which reveals the logarithmic dependence of the regret on the time horizon, and which is asymptotically optimal. For our analysis we devise several concentration results for Markov chains, including a maximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish, asymptotically optimal, finite-time guarantees for the case of multiple plays, and IID rewards drawn from a one-parameter exponential family of probability densities.

adaptive allocation rule, allocation rule, markov chain, (14 more...)

arXiv.org Machine Learning

2001.11201

Country:

Europe > Hungary > Budapest > Budapest (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.78)

Add feedback

A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits

Moulos, Vrettos

arXiv.org Machine LearningJan-5-2020

This paper develops a Hoeffding inequality for the partial sums null n k 1f ( X k), where { X k} k Z 0 is an irreducible Markov chain on a finite state space S, and f: S [ a, b] is a real-valued function. Our bound is simple, general, since it only assumes irreducibility and finiteness of the state space, and powerful. In order to demonstrate its usefulness we provide two applications in multi-armed bandit problems. The first is about identifying an approximately best Markovian arm, while the second is concerned with regret minimization in the context of Markovian bandits. 1 Introduction Let {X k} k Z 0 be a Markov chain on a finite state space S, with initial distribution q, and irreducible transition probability matrix P, governed by the probability law P q. Let π be its stationary distribution, and f: S [a,b ] be a real-valued function on the state space.

algorithm, inequality, markov chain, (11 more...)

arXiv.org Machine Learning

2001.01199

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Add feedback

Filters

Collaborating Authors

adaptive allocation rule

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Finite-TimeAnalysisofRound-Robin Kullback-LeiblerUpperConfidenceBoundsfor OptimalAdaptiveAllocationwithMultiplePlaysand MarkovianRewards

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

597c7b407a02cc0a92167e7a371eca25-Paper.pdf

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits